An Open-Source Shallow-Transfer Machine Translation Engine for the Romance Languages of Spain
Abstract
We present the current status of development of an open-source shallow-transfer machine translation engine for the Romance languages of Spain (principally Spanish, Catalan and Galician). The engine is being developed as part of a larger government-funded project that also covers non-Romance languages such as Basque and involves both universities and linguistic technology companies. The machine translation architecture uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state-based chunking for structural transfer. It is largely based on the architecture of systems already developed by the Transducens group at the Universitat d'Alacant, such as interNOSTRUM (Spanish–Catalan) and Traductor Universia (Spanish–Portuguese). The scope of the project is, however, wider: the resulting machine translation system will also be usable with new language pairs, and to that end the project aims to propose standard formats for encoding the linguistic data needed. This paper briefly describes the machine translation engine, the formats it uses for linguistic data, and the compilers that convert these data into an efficient format used by the engine.
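The interplay between lexical processing and HMM part-of-speech tagging can be illustrated with a small sketch. The Python below is not the engine's code. It assumes a toy Spanish lexicon stored in a letter trie (a simple acyclic finite-state automaton standing in for the engine's lexical transducers), hand-set HMM probabilities, and illustrative tag names (`det`, `prn`, `n`, `vblex`). The names `LetterTrie` and `viterbi` are hypothetical. Lexical lookup returns every analysis of each word, and the Viterbi algorithm then selects the most probable tag sequence among those analyses; for *la casa* it prefers determiner + noun over pronoun + verb.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One state of an acyclic letter automaton; final states carry analyses."""
    children: dict[str, TrieNode] = field(default_factory=dict)
    analyses: list[tuple[str, str]] = field(default_factory=list)  # (lemma, tag)


class LetterTrie:
    """Hypothetical stand-in for a lexical transducer: surface form -> analyses."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, surface: str, lemma: str, tag: str) -> None:
        node = self.root
        for ch in surface:  # one transition per input character
            node = node.children.setdefault(ch, TrieNode())
        node.analyses.append((lemma, tag))

    def lookup(self, surface: str) -> list[tuple[str, str]]:
        node = self.root
        for ch in surface:
            node = node.children.get(ch)
            if node is None:
                return []  # no path for this string: unknown word
        return list(node.analyses)  # empty if the path ends in a non-final state


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, candidate_tags, start_p, trans_p, emit_p):
    """Most probable tag sequence under a first-order HMM.

    Only the tags the lexicon proposes for each word (candidate_tags[i]) are
    considered, so lexical ambiguity defines the search space.
    """
    if not words:
        return []
    # best[i][tag] = (log-prob of best path ending in tag at position i, previous tag)
    best = [{t: (_log(start_p.get(t, 0.0)) + _log(emit_p.get((words[0], t), 0.0)), None)
             for t in candidate_tags[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in candidate_tags[i]:
            emit = _log(emit_p.get((words[i], t), 0.0))
            score, prev = max(
                (best[i - 1][s][0] + _log(trans_p.get((s, t), 0.0)) + emit, s)
                for s in candidate_tags[i - 1]
            )
            column[t] = (score, prev)
        best.append(column)
    tag = max(best[-1], key=lambda t: best[-1][t][0])  # best final state
    path = [tag]
    for i in range(len(words) - 1, 0, -1):  # follow back-pointers
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy lexicon: both words are ambiguous.
lexicon = LetterTrie()
lexicon.add("la", "el", "det")
lexicon.add("la", "la", "prn")          # clitic pronoun reading (toy lemma)
lexicon.add("casa", "casa", "n")        # noun: 'house'
lexicon.add("casa", "casar", "vblex")   # verb: '(he/she) marries'

# Probabilities invented for illustration only.
start_p = {"det": 0.5, "prn": 0.2, "n": 0.2, "vblex": 0.1}
trans_p = {("det", "n"): 0.8, ("det", "vblex"): 0.05,
           ("prn", "vblex"): 0.7, ("prn", "n"): 0.05}
emit_p = {("la", "det"): 0.3, ("la", "prn"): 0.2,
          ("casa", "n"): 0.01, ("casa", "vblex"): 0.005}

words = ["la", "casa"]
candidates = [[tag for _, tag in lexicon.lookup(w)] for w in words]
tags = viterbi(words, candidates, start_p, trans_p, emit_p)
print(list(zip(words, tags)))  # [('la', 'det'), ('casa', 'n')]
```

Scores are kept in log space to avoid numerical underflow on long sentences. In a real tagger the probabilities would normally be estimated from a corpus rather than set by hand.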
Similar resources
An open-source shallow-transfer machine translation toolbox: consequences of its release and availability
By the time Machine Translation Summit X is held in September 2005, our group will have released an open-source machine translation toolbox as part of a large government-funded project involving four universities and three linguistic technology companies from Spain. The machine translation toolbox, which will most likely be released under a GPL-like license, includes (a) the open-source engine i...
Open-Source Portuguese-Spanish Machine Translation
This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese ↔ Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state-based chunking for str...
apertium-cy - a collaboratively-developed free RBMT system for Welsh to English
apertium-cy (http://www.cymraeg.org.uk) is a rule-based “gisting” machine translation system for Welsh to English, with both engine and data released under the GPL. We summarise the development of apertium-cy, evaluate its output, and discuss the advantages of a collaborative development model combined with rule-based MT for marginalised languages. 1. The Apertium platform apertium-cy is a “gistin...
Improving Machine Translation Between Closely Related Romance Languages
The paper gives an overview of the shallow-transfer MT system Apertium, describes an experiment with the language pair Portuguese-Spanish and suggests a modification of the system architecture which leads to higher translation quality. Finally, consequences of the architecture improvement for the design of language resources for shallow-transfer based systems are discussed.
An Open Architecture for Transfer-based Machine Translation between Spanish and Basque
We present the current status of development of an open architecture for the translation from Spanish into Basque. The machine translation architecture uses an open source analyser for Spanish and new modules mainly based on finite-state transducers. The project is integrated in the OpenTrad initiative, a larger government-funded project shared among different universities and small companies, w...
Publication date: 2005